Notes here
don’t forget to create a new library before starting your SAS program or it will disappear after you close SAS.
Whenever you make some notes, use “* … ;”
Import files
To import CSV file: general version
PROC IMPORT DATAFILE = 'path\xx.csv' OUT=LIB.xxx ;
DBMS=CSV;
REPLACE;
RUN;
If for your dataset, some variables missing at the beginning, then use guessingrows:
proc import file='path\patents.csv'
guessingrows=3000000; *approximate number;
out=lib.name dbms=csv;
run;
merge tables
inner join
proc sql;
create table lib.name as
select a.xxx, a.xxx as xxx, b.* from lib1.name as a, lib2.name as b where
input(a.PatentNo,7.)=b.patentno;
quit;
Above case show when variable Patentno is a string while b.patentno is a number, convert a.Patentno from string to number. (the length of patentno is 7)
left join
proc sql;
create table lib.name as
select a.*, b.xxx, b.xx from lib1.xxx as a
left join lib2.xxx as b
on a.xxx=b.xxx;
quit;
notes
to merge table, proc sql is more flexible than merge
copy dataset
Data lib.newname;
set lib.name;
run;
convert some variable from string to number
data lib.newname;
set lib.newname(rename=(var=var1));
var = put(var1, 7.);
drop var1;
run;
drop unuseful info
drop variable
Data lib.newname(drop=xx xx);
set lib.name;
run;
or:
data lib.newname;
set lib.name;
drop xxx;
run;
keep variable you want
Data lib.name(keep=xxx);
set lib.name;
run;
drop duplicates
proc sort data= lib.name nodupkey
out= lib.newname;
by xxx(variable);
run;
drop missing data
data lib.name;
set lib.name;
if cmiss(xxx) then delete;
run;
Data lib.name;
Set lib.name;
if xxx=" " then delete;
run;
sort data
PROC SORT data=linkaf10.assign;
BY PatentNo;
RUN;
deal with dates in SAS
keep obs after 2010
Data lib.name;
set lib.name;
year= year(xxx);
run;
Data lib.name;
set lib.name;
if year > 2010 then delete;
run;
standardize variables
change to uppercase
data lib.name;
set lib.name;
newvar = upcase(var);
run;
delete all the punctuation characters
data lib.name;
set lib.name;
var1=compress(var, ,'pd')
run;
data lib.name;
set lib.name;
rename var1 var;
run;
create distinct identification(number)
data lib.name;
set lib.name;
retain var1 2000; *start from 2000;
var+1;
run;
export file
proc export data=lib.xxx
outfile= "path\xxx.csv"
dbms=csv replace;
run;